A Unified Approach to Minimum Risk Training and Decoding
نویسندگان
چکیده
We present a unified approach to performing minimum risk training and minimum Bayes risk (MBR) decoding with BLEU in a phrase-based model. Key to our approach is the use of a Gibbs sampler that allows us to explore the entire probability distribution and maintain a strict probabilistic formulation across the pipeline. We also describe a new sampling algorithm called corpus sampling which allows us at training time to use BLEU instead of an approximation thereof. Our approach is theoretically sound and gives better (up to +0.6%BLEU) and more stable results than the standard MERT optimization algorithm. By comparing our approach to lattice MBR, we are also able to gain crucial insights about both methods.
منابع مشابه
Hypergraph Training and Decoding of System Combination in SMT
Tranditional n-best based training and decoding method of system combination can propogate the error because of imprecision parameter estimation and too early prunning. In order to alleviate the problem, the paper proposes hypergraph (HG) based three-pass training and three-pass decoding for different features. In order to construct HG, this paper introduces simplified bracket transduction gram...
متن کاملMinimum hypothesis phone error as a decoding method for speech recognition
In this paper we show how methods for approximating phone error as normally used for Minimum Phone Error (MPE) discriminative training, can be used instead as a decoding criterion for lattice rescoring. This is an alternative to Confusion Networks (CN) which are commonly used in speech recognition. The standard (Maximum A Posteriori) decoding approach is a Minimum Bayes Risk estimate with respe...
متن کاملDiscriminative training for segmental minimum Bayes risk decoding
A modeling approach is presented that incorporates discriminative training procedures within segmental Minimum Bayes-Risk decoding (SMBR). SMBR is used to segment lattices produced by a general automatic speech recognition (ASR) system into sequences of separate decision problems involving small sets of confusable words. Acoustic models specialized to discriminate between the competing words in...
متن کاملMinimum Bayes-risk System Combination
We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for ...
متن کاملProbabilistic inference for phrase-based machine translation : a sampling approach
Recent advances in statistical machine translation (SMT) have used dynamic programming (DP) based beam search methods for approximate inference within probabilistic translation models. Despite their success, these methods compromise the probabilistic interpretation of the underlying model thus limiting the application of probabilistically defined decision rules during training and decoding. As ...
متن کامل